Nucleic Acids Research

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match Nucleic Acids Research's content profile, based on 1128 papers previously published here. The average preprint has a 0.80% match score for this journal, so anything above that is already an above-average fit.

1
Human Oncogene EWS::FLI1 Functions as a Pioneer Factor in Saccharomyces cerevisiae.

Velazquez, D.; Molnar, C.; Reina, J.; Mora, J.; Gonzalez, C.

2026-04-14 cancer biology 10.1101/2025.10.22.680884 medRxiv
Top 4%
4.9%

Ewing sarcoma (EwS) is an aggressive, human-exclusive tumor typically driven by the EWS::FLI1 fusion protein. To assess whether the neomorphic functions of EWS::FLI1 are fundamentally dependent on evolutionarily recent cofactors such as ETS transcription factors (ETS-TFs), Polycomb group (PcG) proteins, CBP/p300, or specific subunits of the BAF complex, we expressed EWS::FLI1 in the model organism Saccharomyces cerevisiae. This minimal system was chosen because several key EWS::FLI1 cofactors possess greatly reduced sequence homology (e.g., BAF) or are lacking altogether (e.g., ETS-TFs, PcG, or CBP/p300). We used co-IP/MS to map the yeast interactome, ChIP-seq to identify gDNA binding sequences, RNA-seq for global gene expression, and engineered reporters to test conversion of (GGAA) tandem repeats (GGAASat) into neoenhancers. We found that the yeast EWS::FLI1 interactome was more limited and qualitatively distinct from its human counterpart, sharing core machinery (e.g., RNA Polymerase II, FACT) but lacking the BAF/SWI-SNF and spliceosome complexes, and showing strong enrichment for the SAGA chromatin remodeling complex. We also found that EWS::FLI1 binds to hundreds of sites in the yeast genome with a clear preference for putative ETS-TF consensus sequences and (CA) dinucleotide repeats. Yet EWS::FLI1-expressing cells presented only minimal transcriptional dysregulation, a stark contrast to the extensive changes observed in human and Drosophila cells. Finally, we found that EWS::FLI1 successfully converted silent GGAASat sequences into active enhancers in yeast. This remarkable result occurs despite the absence of homologs for key human activators, such as CBP/p300, strongly suggesting that EWS::FLI1 can mobilize functionally related, non-homologous pathways to establish neoenhancers at GGAASat sites.
Altogether, our results indicate that EWS::FLI1's core ability to drive GGAASat-dependent gene expression is a conserved, ancient property, while GGAASat-independent extensive transcriptome reprogramming is dependent on co-factors and pathways specific to animal cells.

2
Dynamic Quantum Clustering of Gliomas RNA-seq Identifies Diagnostic Separation and Survival Gradients

Jahaniani, F.; Schrodi, S. J.; Weinstein, M.

2026-04-10 genetic and genomic medicine 10.64898/2026.04.09.26350535 medRxiv
Top 5%
3.6%

Public RNA-seq sample sets can refine per-tumor diagnosis and risk, but heterogeneous biology and analytic drift often obscure structure. Dynamic Quantum Clustering (DQC), an unsupervised geometry-preserving method requiring no clinical labels or preset cluster counts, addresses both challenges. Applied to RNA-seq from 692 TCGA gliomas (524 low-grade gliomas (LGG), 168 glioblastomas (GBM); 20,057 protein-coding genes), DQC produced two dominant clusters with 90.9% post hoc diagnostic concordance and clear survival time separation. Filtering genes by inter-cluster mean differences yielded a 554-gene subset that improved accuracy to 97.3%. Rank ordering these genes identified ~90 genes that, under DQC, produced three LGG-pure subclusters with ordered but distinct survival outcomes and one GBM-rich cluster (PPV 97.1%). RNA-based clustering without clinical information thus reveals molecular groupings that mirror critically important clinical features. Comparing these clusters defined four nonoverlapping gene modules and assigned four BioCoords per tumor. DQC with BioCoords recapitulated the LGG-to-GBM continuum with a mesenchymal/invasion-extracellular matrix axis exhibiting a monotonic survival gradient, illustrating how geometry-aware unsupervised learning can translate bench and computational discovery into meaningful biology-based patient stratification and prognosis.

3
GRASP: Gene-relation adaptive soft prompt for scalable and generalizable gene network inference with large language models

Feng, Y.; Deng, K.; Guan, Y.

2026-04-14 bioinformatics 10.1101/2025.10.20.683485 medRxiv
Top 7%
3.1%

Gene networks (GNs) encode diverse molecular relationships and are central to interpreting cellular function and disease. The heterogeneity of interaction types has led to computational methods specialized for particular network contexts. Large language models (LLMs) offer a unified, language-based formulation of GN inference by leveraging biological knowledge from large-scale text corpora, yet their effectiveness remains sensitive to prompt design. Here, we introduce Gene-Relation Adaptive Soft Prompt (GRASP), a parameter-efficient and trainable framework that conditions inference on each gene pair through only three virtual tokens. Using factorized gene-specific and relation-aware components, GRASP learns to map each pair's biological context into compact soft prompts that combine pair-specific signals with shared interaction patterns. Across diverse GN inference tasks, GRASP consistently outperforms alternative prompting strategies. It also shows a stronger ability to recover unannotated interactions from synthetic negative sets, suggesting its capacity to identify biologically meaningful relationships beyond existing databases. Together, these results establish GRASP as a scalable and generalizable prompting framework for LLM-based GN inference.

4
Efficient generation of epitope-targeted de novo antibodies with Germinal

Mille-Fragoso, L. S.; Driscoll, C. L.; Wang, J. N.; Dai, H.; Widatalla, T. M.; Zhang, J. L.; Zhang, X.; Rao, B.; Feng, L.; Hie, B. L.; Gao, X. J.

2026-04-15 synthetic biology 10.1101/2025.09.19.677421 medRxiv
Top 9%
2.1%

Obtaining novel antibodies against specific protein targets is a widely important yet experimentally laborious process. Meanwhile, computational methods for antibody design have been limited by low success rates that currently require resource-intensive screening. Here, we introduce Germinal, a broadly enabling generative pipeline that designs antibodies against specific epitopes with nanomolar binding affinities while requiring only low-n experimental testing. Our method co-optimizes antibody structure and sequence by integrating a structure predictor with an antibody-specific protein language model to perform de novo design of functional complementarity-determining regions (CDRs) onto a user-specified structural framework. When tested against four diverse protein targets, Germinal successfully designed functional antibodies across all targets and binder formats, testing only 43-101 designs for each antigen. Validated designs also exhibited robust expression in mammalian cells and high sequence and structural novelty. We provide open-source code and full computational and experimental protocols to facilitate wide adoption. Germinal represents a milestone in efficient, epitope-targeted de novo antibody design, with notable implications for the development of molecular tools and therapeutics.

5
A safer fluorescent in situ hybridization protocol for cryosections

Chihara, A.; Mizuno, R.; Kagawa, N.; Takayama, A.; Okumura, A.; Suzuki, M.; Shibata, Y.; Mochii, M.; Ohuchi, H.; Sato, K.; Suzuki, K.-i. T.

2026-04-16 molecular biology 10.1101/2025.05.25.655994 medRxiv
Top 12%
1.5%

Fluorescent in situ hybridization (FISH) enables highly sensitive, high-resolution detection of gene transcripts. Moreover, by employing multiple probes, this technique allows for multiplexed, simultaneous detection of distinct gene expression patterns spatiotemporally, making it a valuable spatial transcriptomics approach. Owing to these advantages, FISH techniques are rapidly being adopted across diverse areas of basic biology. However, conventional protocols often rely on volatile, toxic reagents such as formalin or methanol, posing potential health risks to researchers. Here, we present a safer protocol that replaces these chemicals with low-toxicity alternatives, without compromising the high detection sensitivity of FISH. We validated this protocol using both in situ hybridization chain reaction (HCR) and signal amplification by exchange reaction (SABER)-FISH in frozen sections of various model organisms, including mouse (Mus musculus), amphibians (Xenopus laevis and Pleurodeles waltl), and medaka (Oryzias latipes). Our results demonstrate successful multiplexed detection of morphogenetic and cell-type marker genes in these model animals using this safer protocol. The protocol has the additional advantage of requiring no proteolytic enzyme treatment, thus preserving tissue integrity. Furthermore, we show that this protocol is fully compatible with EGFP immunostaining, allowing for the simultaneous detection of mRNAs and reporter proteins in transgenic animals. This protocol retains the benefits of highly sensitive, multiplexed, and multimodal detection afforded by integrating in situ HCR and SABER-FISH with immunohistochemistry, while providing a safer option for researchers, thereby offering a valuable tool for basic biology.

6
Fine-Tuning PubMedBERT for Hierarchical Condition Category Classification

Wang, X.; Hammarlund, N.; Prosperi, M.; Zhu, Y.; Revere, L.

2026-04-15 health systems and quality improvement 10.64898/2026.04.13.26350814 medRxiv
Top 13%
1.3%

Automating Hierarchical Condition Category (HCC) assignment directly from unstructured electronic health record (EHR) notes remains an important but understudied problem in clinical informatics. We present HCC-Coder, an end-to-end NLP system that maps narrative documentation to 115 Centers for Medicare & Medicaid Services (CMS) HCC codes in a multi-label setting. On the test dataset, HCC-Coder achieves a macro-F1 of 0.779 and a micro-F1 of 0.756, with a macro-sensitivity of 0.819 and macro-specificity of 0.998. By contrast, Generative Pre-trained Transformer (GPT)-4o achieves its best scores, a macro-F1 of 0.735 and a micro-F1 of 0.708, under five-shot prompting. The fine-tuned model demonstrates consistent absolute improvements of 4-5 percentage points in F1-scores over GPT-4o. To address severe label imbalance, we incorporate inverse-frequency weighting and per-label threshold calibration. These findings suggest that domain-adapted transformers provide more balanced and reliable performance than prompt-based large language models for hierarchical clinical coding and risk adjustment.
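The imbalance handling and the two F1 variants named in this abstract can be illustrated with a small pure-Python sketch. It is not the HCC-Coder implementation: the weighting formula (samples divided by positives per label), the threshold grid, and all function names are assumptions chosen for illustration, and the toy data are invented.

```python
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[float]]


def label_counts(y_true: Rows, y_pred: Rows, j: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for label j."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
    return tp, fp, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true: Rows, y_pred: Rows) -> Tuple[float, float]:
    """Macro-F1 averages per-label F1; micro-F1 pools counts over labels."""
    n_labels = len(y_true[0])
    counts = [label_counts(y_true, y_pred, j) for j in range(n_labels)]
    macro = sum(f1(*c) for c in counts) / n_labels
    micro = f1(sum(c[0] for c in counts),
               sum(c[1] for c in counts),
               sum(c[2] for c in counts))
    return macro, micro


def inverse_frequency_weights(y_true: Rows) -> List[float]:
    """Assumed scheme: weight_j = n_samples / positives_j (rare labels weigh more)."""
    n = len(y_true)
    n_labels = len(y_true[0])
    return [n / max(1, sum(row[j] for row in y_true)) for j in range(n_labels)]


def calibrate_thresholds(y_true: Rows, scores: Rows) -> List[float]:
    """Pick, per label, the grid threshold with the best validation F1."""
    grid = [i / 10 for i in range(1, 10)]  # assumed grid: 0.1 ... 0.9
    thresholds = []
    for j in range(len(y_true[0])):
        true_j = [[row[j]] for row in y_true]
        best_t, best_f = 0.5, -1.0
        for t in grid:
            pred_j = [[1 if row[j] >= t else 0] for row in scores]
            score = f1(*label_counts(true_j, pred_j, 0))
            if score > best_f:  # ties keep the lower threshold
                best_t, best_f = t, score
        thresholds.append(best_t)
    return thresholds


def apply_thresholds(scores: Rows, thresholds: Sequence[float]) -> List[List[int]]:
    """Binarize each label's score with its own threshold."""
    return [[1 if s >= t else 0 for s, t in zip(row, thresholds)] for row in scores]
```

In a real pipeline the thresholds would be tuned on a held-out validation split rather than on the test set, and the weights would typically feed the training loss; both details are left out of this sketch.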

7
Deriving LD-adjusted GWAS summary statistics through linkage disequilibrium deconvolution

Nouira, A.; Favre Moiron, M.; Tournaire, M.; Verbanck, M.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350574 medRxiv
Top 13%
1.3%

Genome-wide association studies (GWAS) have identified numerous genetic variants associated with complex traits. However, linkage disequilibrium (LD) confounds these associations, leading to false positives where non-causal variants appear associated because they are correlated with nearby causal variants. This is particularly the case in highly polygenic traits where the genome can be saturated in causal variants. To address this issue, we propose LDeconv, a method based on truncated singular value decomposition (SVD) that adjusts GWAS summary statistics without requiring individual-level genotype data. This approach accounts for LD structure, isolates causal variants in high-LD regions, and improves the reliability of effect size estimates. We assess its performance through simulations across various LD scenarios, conduct extensive sensitivity analyses, and apply it to real GWAS data from the UK Biobank. Our results demonstrate that LDeconv effectively reduces false discoveries while preserving true associations, offering a robust framework for post-GWAS analysis.
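The abstract does not spell out how LDeconv uses the truncated SVD, so the following is only a generic sketch of the idea, not the authors' algorithm. It assumes the common approximation that marginal statistics equal the LD correlation matrix times the joint effects, and it inverts that relation with a rank-k pseudo-inverse. The function name, the NumPy dependency, and the toy LD matrix are all hypothetical.

```python
import numpy as np


def truncated_svd_adjust(marginal, ld, k):
    """Approximate joint effects from marginal statistics.

    Assumes marginal ~= ld @ joint, then applies the rank-k pseudo-inverse
    of the LD matrix (components beyond the k largest singular values are
    discarded to avoid amplifying noise).
    """
    marginal = np.asarray(marginal, dtype=float)
    ld = np.asarray(ld, dtype=float)
    if not 1 <= k <= ld.shape[0]:
        raise ValueError("k must be between 1 and the number of variants")
    u, s, vt = np.linalg.svd(ld)  # singular values come back in descending order
    return vt[:k].T @ ((u[:, :k].T @ marginal) / s[:k])


# Toy example: variants 0 and 1 are in high LD (r = 0.9); variant 2 is independent.
ld = np.array([[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
joint = np.array([1.0, 0.0, 0.5])  # only variants 0 and 2 have an effect
marginal = ld @ joint              # variant 1 looks associated purely through LD

full_rank = truncated_svd_adjust(marginal, ld, k=3)
rank_two = truncated_svd_adjust(marginal, ld, k=2)
```

In this toy case the full-rank inverse recovers the joint effects exactly, while dropping the smallest component spreads the effect evenly over the two correlated variants. That trade-off between stability and resolution is what the choice of k controls in any truncated-SVD scheme.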

8
Vector2Variant: Discovery of Genetic Associations from ML Derived Representations without Phenotype Engineering

Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.

2026-04-17 genetic and genomic medicine 10.64898/2026.04.10.26350624 medRxiv
Top 13%
1.3%

Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, time-series, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.

9
De novo designed bifunctional proteins for targeted protein degradation

Mylemans, B.; Korona, B.; Acevedo-Jake, A. M.; MacRae, A.; Edwards, T. A.; Huang, D. T.; Wilson, A. J.; Itzhaki, L. S.; Woolfson, D. N.

2026-04-15 synthetic biology 10.64898/2025.12.22.695915 medRxiv
Top 17%
0.8%

Targeted protein degradation (TPD) is a therapeutic strategy to remove disease-causing proteins by routing them to the ubiquitin-proteasome, autophagy, or lysosomal machineries. For instance, proteolysis-targeting chimeras (PROTACs) are synthetic hetero-bifunctional small molecules that simultaneously bind the target and an E3 ubiquitin ligase to drive ubiquitination and degradation by the proteasome. Despite considerable success, designing such molecules is challenging and the number of currently addressable ubiquitin E3 ligases is limited. Here we demonstrate hetero-bifunctional de novo designed proteins as alternatives for TPD to access more targets and ligases. First, we develop a stable and highly adaptable helix-turn-helix scaffold for presenting different binding sites. Next, we use computational protein design to incorporate and embellish hot-spot binding sites to target BCL-xL, plus short linear motifs (SLiMs) for KLHL20 ligase recruitment. The resulting mono- and bi-functionalised proteins bind the targets in vitro, and the latter degrade BCL-xL in cells leading to apoptosis.

10
Identification, evolutionary history and characteristics of orphan genes in root-knot nematodes

Seckin, E.; Colinet, D.; Bailly-Bechet, M.; Seassau, A.; Bottini, S.; Sarti, E.; Danchin, E. G.

2026-04-11 bioinformatics 10.64898/2025.12.19.695360 medRxiv
Top 18%
0.7%

Orphan genes, lacking homologs in other species, are systematically found across genomes. Their presence may result from extensive divergence from pre-existing genes or from de novo gene birth, which occurs when a gene emerges from a previously non-genic region. In this study, we identified orphan genes in the genomes of globally distributed plant-parasitic nematodes of the genus Meloidogyne and investigated their origins, evolution, and characteristics. Using a comparative genomics framework across 85 nematode species, we found that 18% of Meloidogyne genes are genus-specific, transcriptionally supported orphans. By combining ancestral sequence reconstruction and synteny-based approaches, we inferred that 20% of these orphan genes originated through high divergence, while 18% likely emerged de novo. Proteomic and translatomic evidence confirmed the translation of a subset of these genes, and feature analyses revealed distinctive molecular signatures, including shorter length, signal peptide enrichment, and a tendency for extracellular localization. These findings highlight orphan genes as a substantial and previously underexplored component of the Meloidogyne genome, with potential roles in their worldwide parasitism.

11
Quantum-Refined Latent Diffusion: A Hybrid Generative Framework for Imbalanced ECG Classification

Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.

2026-04-13 cardiovascular medicine 10.64898/2026.04.09.26350502 medRxiv
Top 18%
0.7%

Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.

12
APOE4 Allele Frequencies Show Dramatic Variation Across Indian Populations

Ramdas, S.; Kahali, B.

2026-04-13 genetic and genomic medicine 10.64898/2026.04.09.26350483 medRxiv
Top 19%
0.7%

The APOE ε4 allele is the strongest genetic risk factor for Alzheimer's disease. However, its distribution across Indian populations is poorly characterized. We analyze APOE allele frequencies in 9,524 individuals from 83 distinct populations in the GenomeIndia dataset. ε4 frequencies show large variation across populations within India, ranging from 2.7% to 36.1%, with a median of 11%. Tribal populations have higher ε4 frequencies compared to non-tribal groups, while Tibeto-Burman populations have significantly lower frequencies. One tribal population from the northern coastal highlands has an ε4 frequency of 0.36, with 59% of individuals being carriers. ε4 carrier status correlates significantly with lipid phenotypes including LDL, HDL, total cholesterol, and triglycerides. Collectively, these findings reveal exceptional genetic diversity in Alzheimer's disease risk across India and have important implications for population-specific screening strategies, genetic counseling, and precision medicine approaches to dementia prevention.
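The allele frequency and the carrier fraction quoted for the highest-frequency population can be related with a short calculation. The sketch below assumes Hardy-Weinberg proportions, which the abstract does not state, so it is a plausibility check rather than the authors' method. The function names are illustrative.

```python
def allele_frequency(n_heterozygous: int, n_homozygous: int, n_individuals: int) -> float:
    """Allele frequency from genotype counts in a diploid sample."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    return (n_heterozygous + 2 * n_homozygous) / (2 * n_individuals)


def expected_carrier_fraction(allele_freq: float) -> float:
    """Fraction carrying at least one copy, assuming Hardy-Weinberg proportions.

    Non-carriers have two non-risk alleles, so carriers = 1 - (1 - p)^2.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    return 1.0 - (1.0 - allele_freq) ** 2


# Frequency reported in the abstract for the highest-frequency population.
carriers = expected_carrier_fraction(0.36)
print(f"expected carrier fraction: {carriers:.2f}")
```

Under that assumption a frequency of 0.36 gives 1 - 0.64² ≈ 0.59, which agrees with the 59% carrier figure in the abstract.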

13
Shared inheritance reveals landscape of somatic and germline cancer risk in TP53

MacGregor, H. A. J.; Blundell, J. R.; Easton, D. F.

2026-04-11 genetic and genomic medicine 10.64898/2026.04.10.26350605 medRxiv
Top 24%
0.4%

Pathogenic variants in TP53, the key tumour-suppressor gene underlying Li-Fraumeni syndrome (LFS), are among the best-established causes of inherited cancer predisposition. However, large-scale sequencing has revealed that many apparently pathogenic TP53 variants detected in blood are the result of somatic clonal expansions, complicating risk interpretation. Using blood-derived whole-exome data from 469,391 UK Biobank participants, we combined variant allele fraction (VAF) with haplotype-sharing analysis to distinguish germline and somatic TP53 variants. Germline variants were concentrated at sites linked to partial loss of p53 function and lower disease penetrance, whereas classic LFS alleles appeared almost entirely somatic. High-VAF carriers of classic LFS alleles had a markedly increased risk of haematological malignancy but not solid tumours, consistent with large TP53-mutant clonal expansions. The prevalence of somatic clonal expansion also correlated with missense variant pathogenicity, suggesting that somatic activity provides an informative in vivo proxy for functional impact. These results provide new insights into TP53-associated cancer risk at the population level, demonstrate that somatic rather than germline risk predominates in middle-aged healthy adults and provide a scalable framework for variant classification in large-scale population genomics.

14
Heterogeneous, Population-Level Drug-Tolerant Persisters Exhibit Ion-Channel Remodeling and Ferroptosis Susceptibility

Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.

2026-04-13 systems biology 10.1101/2022.02.03.479045 medRxiv
Top 26%
0.3%

Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."

15
Genetic analysis of female genital tract polyps implicates genome stability, estrogen signalling and shared susceptibility with proliferative gynaecological disorders

Ingold, N.; Frankcombe, S.; Bouttle, K.; Moro, E.; Canson, D.; Zoellner, S.; Patil, S.; Dzigurski, J.; Glubb, D. M.; Laisk, T.; O'Mara, T. A.

2026-04-16 genetic and genomic medicine 10.64898/2026.04.13.26350740 medRxiv
Top 26%
0.3%

Female genital tract (FGT) polyps are common benign growths affecting up to half of all women. However, they carry malignant potential, and their genetic architecture remains poorly defined. We conducted a genome-wide association study (GWAS) meta-analysis across four biobanks (48,400 cases, 477,134 controls), identifying 26 risk loci for FGT polyps, 12 of which were previously unreported. Integrative gene prioritisation highlighted 193 candidate genes, revealing a potential convergent biological mechanism in which germline variation in DNA replication and maintenance (e.g., PRIM1, TERT and HMGA1) compromises genomic stability in the context of hormone-driven proliferation (e.g., ESR1 and GREB1). This susceptibility is further modulated by metabolic drivers of estrogen biosynthesis, underscored by specific adiposity-related loci (e.g., RSPO3 and PLCE1) and the aromatase gene CYP19A1. Mendelian randomisation demonstrated bidirectional causal relationships with endometriosis and fibroids, and endometrial cancer. Leveraging the shared genetic architecture of FGT polyps and other gynaecological disorders via multi-trait analysis revealed an additional 26 loci, validating sub-threshold regions encompassing HMGA1 and GREB1. In total, 52 risk loci were identified (36 novel), 39 of which replicated in an independent cohort. These findings reframe polyps not merely as local gynaecological overgrowths but as manifestations of a systemic proliferative syndrome characterised by dysregulated genome stability and estrogen signalling, which may also impact malignant transformation.

16
Uncovering the mechanisms of clinically-relevant altered antibiotic responses of Staphylococcus aureus under wound infection-mimetic conditions

Rieger, C. D.; Molaeitabari, A.; Dahms, T. E. S.; El-Halfawy, O. M.

2026-04-17 microbiology 10.64898/2025.12.22.696073 medRxiv
Top 27%
0.3%

Standard in vitro antimicrobial susceptibility testing (AST) using Mueller-Hinton broth (MHB) does not reflect infection-site conditions, and its results often do not correlate with therapeutic outcomes. Here, we compared the antibiotic susceptibility of methicillin-resistant Staphylococcus aureus (MRSA), a common chronic wound pathogen, in simulated wound fluid (SWF) resembling wound exudate versus MHB, revealing discordant AST results across six of nine tested antibiotic classes. The most significant were 128-fold increased resistance to tetracyclines and 256-fold sensitization to β-lactams in SWF. Tetracycline resistance was mediated by MntC, an extracellular manganese-binding protein, whereas β-lactam sensitization was driven by cell envelope remodelling in SWF. Galleria mellonella wound infection results matched the SWF susceptibility phenotypes, suggesting SWF better predicts in vivo wound infection therapeutic outcomes. These comprehensive phenotypic and mechanistic insights into MRSA antibiotic responses under wound-infection-mimetic conditions with direct in vivo validation identify a potential new antibiotic adjuvant target and may guide improved antibiotic therapy for MRSA wound infections.

17
Functional annotation of breast cancer risk loci implicates perturbation of FILIP1L expression in mammary fibroblasts in influencing breast cancer risk.

Zvereva, A.; Kemp, H.; Gillespie, A.; Tomczyk, K.; Romualdo Cardoso, S.; Sevgi, S.; Mackie, K.; Fedele, V.; Alexander, J.; Goulding, I.; Gomm, J.; Jones, J. L.; Baxter, J. S.; Pettitt, S. J.; Lord, C. J.; Fletcher, O.; Haider, S.; Johnson, N.

2026-04-10 genetic and genomic medicine 10.64898/2026.04.09.26350488 medRxiv
Top 29%
0.3%

Genome-wide association studies have led to the identification of more than 150 genomic regions that are associated with breast cancer risk. Translating these findings into a greater understanding of that risk requires identification of functional variants and target genes. Breast cancer progression and metastasis do not depend solely on cancer cell autonomous defects; the stroma, of which fibroblasts comprise a dominant component, also has a functional role. We generated promoter capture Hi-C data in primary and immortalized mammary fibroblasts and identified 28 interaction peaks involving 116 credible causal breast cancer variants and 26 target genes that were exclusive to fibroblasts. Integrating these data with H3K27ac CUT&Tag peaks identified a potentially functional variant (rs17393059) and target gene (filamin A interacting protein 1 like (FILIP1L)) at the 3q12.1 breast cancer risk locus. Using genome-wide functional data in breast-relevant cell types we demonstrate that perturbation of gene expression in mammary fibroblasts may impact risk of breast cancer by a cell non-autonomous mechanism.

18
VAE (Variational Autoencoder) Based Gastrotype Identification and Predictive Diagnosis of Helicobacter pylori Infection

Ma, Z.; Qiao, Y.

2026-04-13 gastroenterology 10.64898/2026.04.11.26350690 medRxiv
Top 30%
0.3%

Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastrotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastrotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastrotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastrotypes without forcing discrete structure or arbitrarily fixing cluster numbers.
Using this framework, we identified four gastrotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
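The cluster-number selection described in the Methods (score several values of K with the silhouette index, keep the best) can be sketched in pure Python. This is an illustration only: the abstract does not name the clustering algorithm or distance, so the sketch assumes Euclidean distance and takes precomputed cluster assignments as input. The function names and the four toy points are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette over all points (Euclidean distance).

    For each point, a = mean distance to its own cluster and b = mean
    distance to the nearest other cluster; the point scores (b - a) / max(a, b).
    Points in singleton clusters score 0 by convention.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            values.append(0.0)
            continue
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != lab
        )
        scale = max(a, b)
        values.append((b - a) / scale if scale > 0 else 0.0)
    return mean(values)


def select_k(points: List[Point],
             labelings: Dict[int, List[int]]) -> Tuple[int, Dict[int, float]]:
    """Score each candidate K and return the K with the highest mean silhouette."""
    scores = {k: silhouette_score(points, labels) for k, labels in labelings.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy latent coordinates: two tight, well-separated pairs.
latent = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k, silhouettes = select_k(latent, candidates)
```

In the workflow the abstract describes, the points would be the VAE latent coordinates and the candidate labelings would come from clustering at each K from 2 to 10.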

19
Lamin B1 physically regulates neuronal migration by modulating nuclear deformability in the developing cortex

Shin, M.; Ishida, S.; Yu, J.; Iwashita, M.; Jang, G.-u.; Cortelli, P.; Giorgio, E.; Cani, I.; Ramazzotti, G.; Ratti, S.; Yoshino, D.; Rah, J.-C.; Imai, Y.; Kosodo, Y.

2026-04-17 neuroscience 10.1101/2025.10.22.683830 medRxiv
Top 30%
0.3%

Neuronal migration is a vital process that positions billions of neurons to create a functional brain. To navigate the constrained microenvironments within the cortex, precise control over the nuclear mechanics in migrating neurons is indispensable. Here, we show that Lamin B1 (LB1) regulates neuronal migration by modulating nuclear deformability. Excess LB1 in neurons halted migration without altering laminar identity or overall gene expression in vivo, while in vitro, it elevated nuclear stiffness and impaired neuronal motility in confined spaces. Moreover, mispositioned neurons resulted in electrophysiological defects in the brain. Computational modeling predicted a temporal relationship between nuclear deformation and enhanced migration velocity, which was validated experimentally through live imaging. Notably, cerebral organoid assays using iPS cells established from patients with LMNB1 duplication exhibited impaired neuronal migration in a human model. Collectively, these findings demonstrate that LB1 is a critical regulator of nuclear mechanics, ensuring the accurate spatiotemporal positioning of neurons.

20
Drug response profiling guides precision therapy in relapsed and refractory childhood acute lymphoblastic leukemia

Steffen, F. D.; Lissat, A.; Alten, J.; Kriston, A.; Scheidegger, N.; Eckert, C.; Bodmer, N.; Schori, L.; Schühle, S.; Arpagaus, A.; Gutnik, S.; Manioti, D.; Bruderer, N.; Zeckanovic, A.; Västrik, I.; Nyiri, G.; Kovacs, F.; Thorhauge Als-Nielsen, B. E.; Attarbaschi, A.; Rademacher, A.; Elitzur, S.; Jacoby, E.; De Moerloose, B.; Svenberg, P.; Ancliff, P.; Sramkova, L.; Buldini, B.; Balduzzi, A.; Boer, J. M.; Mielcarek, M.; Ceppi, F.; Ansari, M.; Halter, J.; Schmiegelow, K.; Locatelli, F.; DelBufalo, F.; Stanulla, M.; Kulozik, A. E.; Schrappe, M.; Rohrlich, P.; Cave, H.; Baruchel, A.; von Stack

2026-04-11 oncology 10.64898/2026.04.08.26350164 medRxiv
Top 30%
0.3%

Children with relapsed or refractory acute lymphoblastic leukemia (ALL) require more effective and less toxic therapies. We established a prospective, multicenter Drug Response Profiling (DRP) registry (NCT06550102) integrating functional testing into precision-guided treatment. DRP was performed for 340 patients from 17 European countries with a turnaround time of two weeks. Image-based drug screening with over 135,000 unique perturbations revealed a heterogeneous landscape of ex vivo responses to 88 drugs on average. Ranking drug responses across the patient cohort defined individual drug fingerprints, identifying "DRP twins" by similarity in sensitivity and resistance independent of genetic ALL subtypes. Of 239 high-risk patients with follow-up, DRP-informed interventions were reported for 63 patients (26%). Patients received combination therapies based on venetoclax, tyrosine kinase inhibitors, trametinib, bortezomib or selinexor, resulting in objective clinical responses in 43 cases (68%). Precision-guided treatments allowed bridging to cellular therapies in 42 patients, among whom 28 (67%) were still alive with a median follow-up of 21 months after DRP (IQR: 14.7-26.6 months). Top responders to venetoclax, ranked within the first tertile of the cohort, had superior 1-year event-free survival compared to venetoclax non-responders (0.57 [95% CI, 0.39-0.85] vs. 0.25 [95% CI, 0.11-0.58]). Collectively, these findings demonstrate the feasibility and clinical relevance of functional profiling within an international network. This scalable framework enables individualized therapy selection for enrolment in adaptive precision trials for high-risk pediatric ALL.